On DNA, RNA and Protein sequence molecules, sites can be defined that are attached to a specific stretch of sequence. Internally, sequence features have the following type hierarchy:
Site -> SequenceFeature -> (PolynucleotideFeature,PolypeptideFeature)
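The hierarchy can be mirrored with plain Python subclassing. The classes below are hypothetical stand-ins that only reproduce the chain named above. They are not the wc_rules implementations. The point is that every sequence feature is also a `Site`, which is why it can be attached to a molecule the same way a regular site is.

```python
# Hypothetical stand-ins that mirror the documented hierarchy:
# Site -> SequenceFeature -> (PolynucleotideFeature, PolypeptideFeature).
# These are NOT the wc_rules classes; they only illustrate the subclass chain.
class Site:
    pass


class SequenceFeature(Site):
    pass


class PolynucleotideFeature(SequenceFeature):
    """Feature on a polynucleotide (DNA or RNA) molecule."""


class PolypeptideFeature(SequenceFeature):
    """Feature on a polypeptide (protein) molecule."""


# Every sequence feature is also a Site...
print(issubclass(PolynucleotideFeature, Site))  # True
# ...but the two leaf classes are siblings, not parent and child.
print(issubclass(PolypeptideFeature, PolynucleotideFeature))  # False
```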
To create a sequence feature, call its constructor and attach it to a molecule as you would a regular site (here, with set_molecule()). Use set_location() to specify the location of the feature, passing either the (start, end) or the (start, length) keyword arguments.
In [1]:
from wc_rules.bioseq import DNA, PolynucleotideFeature
inputstr = 'TTGTTATCGTTACCGGGAGTGAGGCGTCCGCGTCCCTTTCAGGTCAAGCGACTGAAAAACCTTGCAGTTGATTTTAAAGCGTATAGAAGACAATACAGA'
dna1 = DNA(ambiguous=False,id='dna1').set_sequence(inputstr)
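# Attach two features to the same DNA molecule
feat1 = PolynucleotideFeature(id='feat1').set_molecule(dna1)
feat2 = PolynucleotideFeature(id='feat2').set_molecule(dna1)
# Location given as (start, end)
feat1.set_location(start=90,end=99)
# Location given as (start, length)
feat2.set_location(start=90,length=9)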
print([x.get_id() for x in dna1.get_sites()])
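One way a method such as set_location() could accept both keyword forms is to normalize them into a single (start, end) pair. The sketch below is illustrative only. The function name `resolve_location` is hypothetical. It assumes 0-based, end-exclusive (half-open) intervals, so `length == end - start`. That convention is consistent with the example above, since the input sequence is 99 residues long, but it is not confirmed by this notebook.

```python
# Hypothetical helper (not part of wc_rules): turn either (start, end) or
# (start, length) into one {'start', 'end'} dict.
# ASSUMPTION: half-open intervals, i.e. end is exclusive and length == end - start.
def resolve_location(start, end=None, length=None):
    # Exactly one of end / length must be supplied.
    if (end is None) == (length is None):
        raise ValueError("pass exactly one of end= or length=")
    if length is not None:
        end = start + length
    if not 0 <= start <= end:
        raise ValueError("require 0 <= start <= end")
    return {"start": start, "end": end}


print(resolve_location(start=90, end=99))    # {'start': 90, 'end': 99}
print(resolve_location(start=90, length=9))  # {'start': 90, 'end': 99}
```

Under this assumption, feat1 and feat2 above describe the same nine-residue stretch.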
To get the location of a sequence feature, use get_location(). The output is a dict with the keys start and end, both mapped to int values.
In [2]:
print(feat1.get_location())
print(feat2.get_location())
Alternatively, the start and end values can be accessed separately, either through the start and end attributes or through the get_start() and get_end() methods.
In [3]:
print([feat1.start,feat1.end])
print([feat1.get_start(),feat1.get_end()])
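The three access routes shown in cells [2] and [3] are three views of the same two integers. The class below is a hypothetical stand-in named `FeatureSketch`, not the wc_rules class. It stores start and end as plain attributes, has the getters return them, and has get_location() bundle both into a dict with the documented keys.

```python
# Hypothetical stand-in for a sequence feature (not the wc_rules class).
class FeatureSketch:
    def __init__(self, start, end):
        # Plain attributes, readable directly as f.start / f.end.
        self.start = start
        self.end = end

    def get_start(self):
        return self.start

    def get_end(self):
        return self.end

    def get_location(self):
        # Dict with keys 'start' and 'end', mirroring get_location() above.
        return {"start": self.start, "end": self.end}


f = FeatureSketch(90, 99)
print(f.get_location())              # {'start': 90, 'end': 99}
print([f.start, f.end])              # [90, 99]
print([f.get_start(), f.get_end()])  # [90, 99]
```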
To get the length of a sequence feature, use get_length().
In [4]:
print(feat1.get_length())
print(feat2.get_length())
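A sketch of how a length can be derived from a location dict follows. It assumes half-open intervals (end exclusive). This convention is an assumption and is not stated in this notebook. Under it, the length is simply `end - start`, and it matches the length of the corresponding Python slice. The function name `feature_length` is hypothetical.

```python
# Hypothetical helper: length of a feature from its location dict.
# ASSUMPTION: half-open [start, end) interval, so length == end - start.
def feature_length(location):
    return location["end"] - location["start"]


loc = {"start": 90, "end": 99}
print(feature_length(loc))  # 9

# Under that convention the length equals the length of the Python slice.
seq = "ACGT" * 25  # 100-residue toy sequence
print(len(seq[loc["start"]:loc["end"]]) == feature_length(loc))  # True
```

If the interval were instead closed (end inclusive), the length would be `end - start + 1`, and `length=9` and `end=99` would no longer describe the same stretch.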
To read the sequence of a sequence feature, call get_sequence() on the parent molecule, passing the feature's start and end.
In [5]:
print(dna1.get_sequence(feat1.start,feat1.end))
print(dna1.get_sequence(feat1.get_start(),feat1.get_end()))
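Reading a feature's sequence through the molecule can be pictured as slicing the molecule's string. The sketch below uses a hypothetical `MoleculeSketch` class, not the wc_rules DNA class. It assumes 0-based, end-exclusive indices, so `get_sequence(start, end)` reduces to `sequence[start:end]`.

```python
# Hypothetical stand-in for a sequence molecule (not the wc_rules DNA class).
# ASSUMPTION: 0-based, end-exclusive indices, so get_sequence is a plain slice.
class MoleculeSketch:
    def __init__(self, sequence):
        self.sequence = sequence

    def get_sequence(self, start=None, end=None):
        # With no arguments, the slice covers the whole sequence.
        return self.sequence[start:end]


inputstr = 'TTGTTATCGTTACCGGGAGTGAGGCGTCCGCGTCCCTTTCAGGTCAAGCGACTGAAAAACCTTGCAGTTGATTTTAAAGCGTATAGAAGACAATACAGA'
dna = MoleculeSketch(inputstr)
print(len(inputstr))             # 99
print(dna.get_sequence(90, 99))  # CAATACAGA
```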
Alternatively, the dict returned by get_location() can be unpacked with ** and passed directly to get_sequence().
In [6]:
print(dna1.get_sequence(**feat1.get_location()))
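The unpacking in cell [6] is standard Python: `**` expands a dict into keyword arguments, so it works only because the dict keys (start, end) match the parameter names of get_sequence(). The function below is a hypothetical stand-in used only to show the mechanism.

```python
# Hypothetical stand-in for a get_sequence(start, end) method.
def get_sequence(sequence, start, end):
    return sequence[start:end]


loc = {"start": 2, "end": 6}
seq = "ACGTACGTAC"

# These two calls are equivalent: ** turns the dict into start=2, end=6.
print(get_sequence(seq, **loc))              # GTAC
print(get_sequence(seq, start=2, end=6))     # GTAC

# A dict with a key the function does not accept raises TypeError.
try:
    get_sequence(seq, **{"start": 2, "end": 6, "length": 4})
except TypeError as err:
    print("TypeError:", err)
```

This also means the pattern depends on get_location() returning exactly the keys that get_sequence() accepts.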